We then tested whether this “impossible typo effect” holds for Haiku and Opus on other benchmarks. We chose BBH and GPQA because Haiku already struggles on them to a reasonable degree even without typos. Here, we no longer observed the impossible typo effect: Haiku’s performance decreased as the typo rate increased.
this should be 1 figure, not 2
and I think this could be combined with the previous section as well